Is Demography Destiny? Application of Machine Learning Techniques to Accurately Predict Population Health Outcomes from a Minimal Demographic Dataset

نویسندگان

  • Wei Luo
  • Thin Nguyen
  • Melanie Nichols
  • Truyen Tran
  • Santu Rana
  • Sunil Gupta
  • Dinh Phung
  • Svetha Venkatesh
  • Steve Allender
چکیده

For years, we have relied on population surveys to keep track of regional public health statistics, including the prevalence of non-communicable diseases. Because of the cost and limitations of such surveys, we often do not have the up-to-date data on health outcomes of a region. In this paper, we examined the feasibility of inferring regional health outcomes from socio-demographic data that are widely available and timely updated through national censuses and community surveys. Using data for 50 American states (excluding Washington DC) from 2007 to 2012, we constructed a machine-learning model to predict the prevalence of six non-communicable disease (NCD) outcomes (four NCDs and two major clinical risk factors), based on population socio-demographic characteristics from the American Community Survey. We found that regional prevalence estimates for non-communicable diseases can be reasonably predicted. The predictions were highly correlated with the observed data, in both the states included in the derivation model (median correlation 0.88) and those excluded from the development for use as a completely separated validation sample (median correlation 0.85), demonstrating that the model had sufficient external validity to make good predictions, based on demographics alone, for areas not included in the model development. This highlights both the utility of this sophisticated approach to model development, and the vital importance of simple socio-demographic characteristics as both indicators and determinants of chronic disease.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Application of ensemble learning techniques to model the atmospheric concentration of SO2

In view of pollution prediction modeling, the study adopts homogenous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of Sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...

متن کامل

Forecasting Stock Price Movements Based on Opinion Mining and Sentiment Analysis: An Application of Support Vector Machine and Twitter Data

Today, social networks are fast and dynamic communication intermediaries that are a vital business tool. This study aims at examining the views of those involved with Facebook stocks so that we can summarize their views to predict the general behavior of this stock and collectively consider possible Facebook stock price movements, and create a more accurate pattern compared to previous patterns...

متن کامل

Hypertension Prediction in Primary School Students Using an Ensemble Machine Learning Method

Introduction: The prevalence of hypertension in children is increasing, and this complication is considered the most important risk factor for cardiovascular diseases in older age. Early detection and control of hypertension can prevent its progress and reduce its consequences. Machine learning methods can help predict this complication promptly and reduce cost and time. This study aimed to pro...

متن کامل

Behavioral Analysis of Traffic Flow for an Effective Network Traffic Identification

Fast and accurate network traffic identification is becoming essential for network management, high quality of service control and early detection of network traffic abnormalities. Techniques based on statistical features of packet flows have recently become popular for network classification due to the limitations of traditional port and payload based methods. In this paper, we propose a metho...

متن کامل

Hypertension Prediction in Primary School Students Using an Ensemble Machine Learning Method

Introduction: The prevalence of hypertension in children is increasing, and this complication is considered the most important risk factor for cardiovascular diseases in older age. Early detection and control of hypertension can prevent its progress and reduce its consequences. Machine learning methods can help predict this complication promptly and reduce cost and time. This study aimed to pro...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 10  شماره 

صفحات  -

تاریخ انتشار 2015